Set initial number of tasks for scaled writer with HBO #20901
feilong-liu merged 2 commits into prestodb:master
Get the number of tasks for the stage, and record it.
Add table writer stats to estimate
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/TextRenderer.java
Add table writer node statistics to plan statistics
Add a field that specifies the number of tasks a scaled writer starts from
Start from 1 if no initial task number specified
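A minimal sketch of this fallback, not the PR's actual code. The class name `InitialWriterTasks`, the method `resolve`, and the `maxTaskCount` clamp are all hypothetical and only illustrate the idea of defaulting to a single task when no suggestion is available.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of Presto.
final class InitialWriterTasks
{
    // A scaled writer starts from a single task unless told otherwise.
    private static final int DEFAULT_INITIAL_TASKS = 1;

    private InitialWriterTasks() {}

    // suggestedTaskCount: the optional initial task count (empty if none was specified).
    // maxTaskCount: assumed upper bound, e.g. the number of available workers.
    static int resolve(Optional<Integer> suggestedTaskCount, int maxTaskCount)
    {
        int initial = suggestedTaskCount.orElse(DEFAULT_INITIAL_TASKS);
        // Clamp into [1, maxTaskCount]; never return fewer than one task.
        return Math.max(DEFAULT_INITIAL_TASKS, Math.min(initial, maxTaskCount));
    }
}
```

With no suggestion the helper returns 1, which matches the behavior before this change.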
Get the suggested number of table writer tasks from the query plan: find the TableWriterNode and read its task-count field, which is set only for scaled writers.
Add a rule to set the initial number of tasks for a table writer
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Add a field to specify the number of tasks to begin with if it's a scaled writer
Get the preferred number of tasks from table writer nodes in the plan
is TableWriterMergeNode relevant here?
No. The scaled writer optimization applies only to TableWriterNode, not to TableWriterMergeNode.
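To illustrate the lookup discussed in this thread, here is a rough sketch over a toy plan tree. The `Toy*` classes and `WriterTaskCountFinder` are made up for this example and are not Presto's real `PlanNode` classes; the sketch only shows the traversal reading the task count from the writer node and passing through the merge node.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Toy plan model for illustration only; these are NOT Presto's real classes.
interface ToyPlanNode
{
    List<ToyPlanNode> sources();
}

final class ToyScan implements ToyPlanNode
{
    public List<ToyPlanNode> sources() { return Collections.emptyList(); }
}

final class ToyTableWriter implements ToyPlanNode
{
    private final ToyPlanNode source;
    private final Optional<Integer> taskCountIfScaledWriter;

    ToyTableWriter(ToyPlanNode source, Optional<Integer> taskCountIfScaledWriter)
    {
        this.source = source;
        this.taskCountIfScaledWriter = taskCountIfScaledWriter;
    }

    Optional<Integer> taskCountIfScaledWriter() { return taskCountIfScaledWriter; }

    public List<ToyPlanNode> sources() { return Collections.singletonList(source); }
}

final class ToyTableWriterMerge implements ToyPlanNode
{
    private final ToyPlanNode source;

    ToyTableWriterMerge(ToyPlanNode source) { this.source = source; }

    public List<ToyPlanNode> sources() { return Collections.singletonList(source); }
}

final class WriterTaskCountFinder
{
    private WriterTaskCountFinder() {}

    // Depth-first search: only the writer node carries a task count;
    // merge nodes and all other nodes are simply traversed.
    static Optional<Integer> find(ToyPlanNode node)
    {
        if (node instanceof ToyTableWriter) {
            return ((ToyTableWriter) node).taskCountIfScaledWriter();
        }
        for (ToyPlanNode source : node.sources()) {
            Optional<Integer> found = find(source);
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }
}
```

In this sketch a plan shaped like merge, then writer, then scan yields the writer's count, while a plan without a writer node yields an empty result.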
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ScaledWriterRule.java
presto-main/src/main/java/com/facebook/presto/sql/planner/plan/TableWriterNode.java
presto-spi/src/main/java/com/facebook/presto/spi/statistics/TableWriterNodeStatistics.java
presto-main/src/main/java/com/facebook/presto/cost/TableWriterNodeStatsEstimate.java
presto-main/src/main/java/com/facebook/presto/execution/scheduler/ScaledWriterScheduler.java
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/TextRenderer.java
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
I just merged #20990 that tracks which optimizers were cost-based and the source of stats used (CBO/HBO).
Can you override functions isCostBased and getStatsSource so this optimizer also gets tracked?
Description
Addresses #20355
Record the number of tasks used by scaled writers in HBO (history-based optimization), and use that history to set the initial number of writer tasks for scaled writers.
Motivation and Context
Scaled writers start with a single task writing data out and add tasks as needed when the source is throttled. With this PR, a scaled writer starts with a task count based on the number of tasks used in previous runs, so it has more parallelism from the beginning, which improves latency.
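The idea can be sketched roughly as follows. This is not the PR's implementation: `WriterTaskHistory`, the string plan key, the "use the last recorded run" policy, and the one-task-per-round scale-up model are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory history store; real HBO storage works differently.
final class WriterTaskHistory
{
    // Assumption: history is keyed by some stable identifier of the plan.
    private final Map<String, Integer> lastTaskCountByPlanKey = new HashMap<>();

    // Record how many writer tasks a finished run ended up using.
    void record(String planKey, int taskCount)
    {
        if (taskCount < 1) {
            throw new IllegalArgumentException("taskCount must be at least 1");
        }
        lastTaskCountByPlanKey.put(planKey, taskCount);
    }

    // Suggest an initial task count for the next run, if history exists.
    Optional<Integer> suggest(String planKey)
    {
        return Optional.ofNullable(lastTaskCountByPlanKey.get(planKey));
    }

    // Assumed model: the scheduler adds one task per scale-up round, so the
    // number of rounds saved equals the head start given by the history.
    static int scaleUpRoundsNeeded(int initialTasks, int requiredTasks)
    {
        return Math.max(0, requiredTasks - initialTasks);
    }
}
```

Under that assumed model, a write that needs 6 tasks takes 5 scale-up rounds when starting from 1 task and none when the previous run's count of 6 is reused.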
Impact
Latency improvement for scaled writer pipelines
Test Plan
Test query
Also run with verifier suite